{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "da6e0538",
   "metadata": {},
   "source": [
    "# python基础入门10_ Pandas与数据表\n",
    "\n",
    "Pandas 是 Python 语言的一个扩展程序库，用于数据分析。Pandas 名字衍生自术语 \"panel data\"（面板数据）和 \"Python data analysis\"（Python 数据分析）。Pandas 是一个开放源码、BSD 许可的库，提供高性能、易于使用的数据结构和数据分析工具。\n",
    "\n",
    "\n",
    "### 安装方法\n",
    "\n",
    "启动anaconda prompt在窗口中输入下载指令安装 pandas库：\n",
    "\n",
    "pip install pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30a60ee9",
   "metadata": {},
   "source": [
    "## DataFrame\n",
    "\n",
    "\n",
    "DataFrame 是 Pandas 中最常用的数据结构，它是一个表格型的二维数据结构，可以包含多个DataFrame 是 Pandas 中最常用的数据结构，它是一个表格型的二维数据结构，可以包含多个列和行，其中每列可以是不同的值类型。例如，你可以使用列表、ndarrays、字典等方式创建 DataFrame。此外，你也可以使用 loc 或者 iloc 属性来返回指定行或列的数据。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ff71beb",
   "metadata": {},
   "source": [
    "## 导入pandas库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "6cd80677",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e38170c",
   "metadata": {},
   "source": [
    "## np.array -> dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "id": "026deef6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,\n",
       "       17, 18, 19, 20, 21, 22, 23, 24])"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr = np.arange(0,25)\n",
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "7cdd82a7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     0\n",
       "0    0\n",
       "1    1\n",
       "2    2\n",
       "3    3\n",
       "4    4\n",
       "5    5\n",
       "6    6\n",
       "7    7\n",
       "8    8\n",
       "9    9\n",
       "10  10\n",
       "11  11\n",
       "12  12\n",
       "13  13\n",
       "14  14\n",
       "15  15\n",
       "16  16\n",
       "17  17\n",
       "18  18\n",
       "19  19\n",
       "20  20\n",
       "21  21\n",
       "22  22\n",
       "23  23\n",
       "24  24"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(arr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "id": "b8d473df",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(25,)"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "id": "4568ccaa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  4],\n",
       "       [ 5,  6,  7,  8,  9],\n",
       "       [10, 11, 12, 13, 14],\n",
       "       [15, 16, 17, 18, 19],\n",
       "       [20, 21, 22, 23, 24]])"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr=arr.reshape(5,5)\n",
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "id": "d01fa9fa",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>15</td>\n",
       "      <td>16</td>\n",
       "      <td>17</td>\n",
       "      <td>18</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>21</td>\n",
       "      <td>22</td>\n",
       "      <td>23</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4\n",
       "0   0   1   2   3   4\n",
       "1   5   6   7   8   9\n",
       "2  10  11  12  13  14\n",
       "3  15  16  17  18  19\n",
       "4  20  21  22  23  24"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(arr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "id": "92a30671",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Column1</th>\n",
       "      <th>Column2</th>\n",
       "      <th>Column3</th>\n",
       "      <th>Column4</th>\n",
       "      <th>Column5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Row1</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row3</th>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row4</th>\n",
       "      <td>15</td>\n",
       "      <td>16</td>\n",
       "      <td>17</td>\n",
       "      <td>18</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row5</th>\n",
       "      <td>20</td>\n",
       "      <td>21</td>\n",
       "      <td>22</td>\n",
       "      <td>23</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Column1  Column2  Column3  Column4  Column5\n",
       "Row1        0        1        2        3        4\n",
       "Row2        5        6        7        8        9\n",
       "Row3       10       11       12       13       14\n",
       "Row4       15       16       17       18       19\n",
       "Row5       20       21       22       23       24"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Add indexs and columns\n",
    "\n",
    "pd.DataFrame(arr,index=['Row1','Row2','Row3','Row4','Row5'],columns=[\"Column1\",\"Column2\",\"Column3\",\"Column4\",\"Column5\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "id": "d33f7235",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(arr,index=['Row1','Row2','Row3','Row4','Row5'],columns=[\"Column1\",\"Column2\",\"Column3\",\"Column4\",\"Column5\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "id": "f856fc5e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Column1</th>\n",
       "      <th>Column2</th>\n",
       "      <th>Column3</th>\n",
       "      <th>Column4</th>\n",
       "      <th>Column5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Row1</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row2</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row3</th>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row4</th>\n",
       "      <td>15</td>\n",
       "      <td>16</td>\n",
       "      <td>17</td>\n",
       "      <td>18</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Row5</th>\n",
       "      <td>20</td>\n",
       "      <td>21</td>\n",
       "      <td>22</td>\n",
       "      <td>23</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Column1  Column2  Column3  Column4  Column5\n",
       "Row1        0        1        2        3        4\n",
       "Row2        5        6        7        8        9\n",
       "Row3       10       11       12       13       14\n",
       "Row4       15       16       17       18       19\n",
       "Row5       20       21       22       23       24"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "id": "8aecf53f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "id": "8cfd1da7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Column1',\n",
       " 'Column2',\n",
       " 'Column3',\n",
       " 'Column4',\n",
       " 'Column5',\n",
       " 'T',\n",
       " '_AXIS_LEN',\n",
       " '_AXIS_ORDERS',\n",
       " '_AXIS_REVERSED',\n",
       " '_AXIS_TO_AXIS_NUMBER',\n",
       " '_HANDLED_TYPES',\n",
       " '__abs__',\n",
       " '__add__',\n",
       " '__and__',\n",
       " '__annotations__',\n",
       " '__array__',\n",
       " '__array_priority__',\n",
       " '__array_ufunc__',\n",
       " '__array_wrap__',\n",
       " '__bool__',\n",
       " '__class__',\n",
       " '__contains__',\n",
       " '__copy__',\n",
       " '__deepcopy__',\n",
       " '__delattr__',\n",
       " '__delitem__',\n",
       " '__dict__',\n",
       " '__dir__',\n",
       " '__divmod__',\n",
       " '__doc__',\n",
       " '__eq__',\n",
       " '__finalize__',\n",
       " '__floordiv__',\n",
       " '__format__',\n",
       " '__ge__',\n",
       " '__getattr__',\n",
       " '__getattribute__',\n",
       " '__getitem__',\n",
       " '__getstate__',\n",
       " '__gt__',\n",
       " '__hash__',\n",
       " '__iadd__',\n",
       " '__iand__',\n",
       " '__ifloordiv__',\n",
       " '__imod__',\n",
       " '__imul__',\n",
       " '__init__',\n",
       " '__init_subclass__',\n",
       " '__invert__',\n",
       " '__ior__',\n",
       " '__ipow__',\n",
       " '__isub__',\n",
       " '__iter__',\n",
       " '__itruediv__',\n",
       " '__ixor__',\n",
       " '__le__',\n",
       " '__len__',\n",
       " '__lt__',\n",
       " '__matmul__',\n",
       " '__mod__',\n",
       " '__module__',\n",
       " '__mul__',\n",
       " '__ne__',\n",
       " '__neg__',\n",
       " '__new__',\n",
       " '__nonzero__',\n",
       " '__or__',\n",
       " '__pos__',\n",
       " '__pow__',\n",
       " '__radd__',\n",
       " '__rand__',\n",
       " '__rdivmod__',\n",
       " '__reduce__',\n",
       " '__reduce_ex__',\n",
       " '__repr__',\n",
       " '__rfloordiv__',\n",
       " '__rmatmul__',\n",
       " '__rmod__',\n",
       " '__rmul__',\n",
       " '__ror__',\n",
       " '__round__',\n",
       " '__rpow__',\n",
       " '__rsub__',\n",
       " '__rtruediv__',\n",
       " '__rxor__',\n",
       " '__setattr__',\n",
       " '__setitem__',\n",
       " '__setstate__',\n",
       " '__sizeof__',\n",
       " '__str__',\n",
       " '__sub__',\n",
       " '__subclasshook__',\n",
       " '__truediv__',\n",
       " '__weakref__',\n",
       " '__xor__',\n",
       " '_accessors',\n",
       " '_accum_func',\n",
       " '_add_numeric_operations',\n",
       " '_agg_by_level',\n",
       " '_agg_examples_doc',\n",
       " '_agg_summary_and_see_also_doc',\n",
       " '_align_frame',\n",
       " '_align_series',\n",
       " '_arith_method',\n",
       " '_as_manager',\n",
       " '_attrs',\n",
       " '_box_col_values',\n",
       " '_can_fast_transpose',\n",
       " '_check_inplace_and_allows_duplicate_labels',\n",
       " '_check_inplace_setting',\n",
       " '_check_is_chained_assignment_possible',\n",
       " '_check_label_or_level_ambiguity',\n",
       " '_check_setitem_copy',\n",
       " '_clear_item_cache',\n",
       " '_clip_with_one_bound',\n",
       " '_clip_with_scalar',\n",
       " '_cmp_method',\n",
       " '_combine_frame',\n",
       " '_consolidate',\n",
       " '_consolidate_inplace',\n",
       " '_construct_axes_dict',\n",
       " '_construct_axes_from_arguments',\n",
       " '_construct_result',\n",
       " '_constructor',\n",
       " '_constructor_sliced',\n",
       " '_convert',\n",
       " '_count_level',\n",
       " '_data',\n",
       " '_dir_additions',\n",
       " '_dir_deletions',\n",
       " '_dispatch_frame_op',\n",
       " '_drop_axis',\n",
       " '_drop_labels_or_levels',\n",
       " '_ensure_valid_index',\n",
       " '_find_valid_index',\n",
       " '_flags',\n",
       " '_from_arrays',\n",
       " '_from_mgr',\n",
       " '_get_agg_axis',\n",
       " '_get_axis',\n",
       " '_get_axis_name',\n",
       " '_get_axis_number',\n",
       " '_get_axis_resolvers',\n",
       " '_get_block_manager_axis',\n",
       " '_get_bool_data',\n",
       " '_get_cleaned_column_resolvers',\n",
       " '_get_column_array',\n",
       " '_get_index_resolvers',\n",
       " '_get_item_cache',\n",
       " '_get_label_or_level_values',\n",
       " '_get_numeric_data',\n",
       " '_get_value',\n",
       " '_getitem_bool_array',\n",
       " '_getitem_multilevel',\n",
       " '_gotitem',\n",
       " '_hidden_attrs',\n",
       " '_indexed_same',\n",
       " '_info_axis',\n",
       " '_info_axis_name',\n",
       " '_info_axis_number',\n",
       " '_info_repr',\n",
       " '_init_mgr',\n",
       " '_inplace_method',\n",
       " '_internal_names',\n",
       " '_internal_names_set',\n",
       " '_is_copy',\n",
       " '_is_homogeneous_type',\n",
       " '_is_label_or_level_reference',\n",
       " '_is_label_reference',\n",
       " '_is_level_reference',\n",
       " '_is_mixed_type',\n",
       " '_is_view',\n",
       " '_iset_item',\n",
       " '_iset_item_mgr',\n",
       " '_iset_not_inplace',\n",
       " '_item_cache',\n",
       " '_iter_column_arrays',\n",
       " '_ixs',\n",
       " '_join_compat',\n",
       " '_logical_func',\n",
       " '_logical_method',\n",
       " '_maybe_cache_changed',\n",
       " '_maybe_update_cacher',\n",
       " '_metadata',\n",
       " '_mgr',\n",
       " '_min_count_stat_function',\n",
       " '_needs_reindex_multi',\n",
       " '_protect_consolidate',\n",
       " '_reduce',\n",
       " '_reindex_axes',\n",
       " '_reindex_columns',\n",
       " '_reindex_index',\n",
       " '_reindex_multi',\n",
       " '_reindex_with_indexers',\n",
       " '_replace_columnwise',\n",
       " '_repr_data_resource_',\n",
       " '_repr_fits_horizontal_',\n",
       " '_repr_fits_vertical_',\n",
       " '_repr_html_',\n",
       " '_repr_latex_',\n",
       " '_reset_cache',\n",
       " '_reset_cacher',\n",
       " '_sanitize_column',\n",
       " '_series',\n",
       " '_set_axis',\n",
       " '_set_axis_name',\n",
       " '_set_axis_nocheck',\n",
       " '_set_is_copy',\n",
       " '_set_item',\n",
       " '_set_item_frame_value',\n",
       " '_set_item_mgr',\n",
       " '_set_value',\n",
       " '_setitem_array',\n",
       " '_setitem_frame',\n",
       " '_setitem_slice',\n",
       " '_slice',\n",
       " '_stat_axis',\n",
       " '_stat_axis_name',\n",
       " '_stat_axis_number',\n",
       " '_stat_function',\n",
       " '_stat_function_ddof',\n",
       " '_take_with_is_copy',\n",
       " '_to_dict_of_blocks',\n",
       " '_typ',\n",
       " '_update_inplace',\n",
       " '_validate_dtype',\n",
       " '_values',\n",
       " '_where',\n",
       " 'abs',\n",
       " 'add',\n",
       " 'add_prefix',\n",
       " 'add_suffix',\n",
       " 'agg',\n",
       " 'aggregate',\n",
       " 'align',\n",
       " 'all',\n",
       " 'any',\n",
       " 'append',\n",
       " 'apply',\n",
       " 'applymap',\n",
       " 'asfreq',\n",
       " 'asof',\n",
       " 'assign',\n",
       " 'astype',\n",
       " 'at',\n",
       " 'at_time',\n",
       " 'attrs',\n",
       " 'axes',\n",
       " 'backfill',\n",
       " 'between_time',\n",
       " 'bfill',\n",
       " 'bool',\n",
       " 'boxplot',\n",
       " 'clip',\n",
       " 'columns',\n",
       " 'combine',\n",
       " 'combine_first',\n",
       " 'compare',\n",
       " 'convert_dtypes',\n",
       " 'copy',\n",
       " 'corr',\n",
       " 'corrwith',\n",
       " 'count',\n",
       " 'cov',\n",
       " 'cummax',\n",
       " 'cummin',\n",
       " 'cumprod',\n",
       " 'cumsum',\n",
       " 'describe',\n",
       " 'diff',\n",
       " 'div',\n",
       " 'divide',\n",
       " 'dot',\n",
       " 'drop',\n",
       " 'drop_duplicates',\n",
       " 'droplevel',\n",
       " 'dropna',\n",
       " 'dtypes',\n",
       " 'duplicated',\n",
       " 'empty',\n",
       " 'eq',\n",
       " 'equals',\n",
       " 'eval',\n",
       " 'ewm',\n",
       " 'expanding',\n",
       " 'explode',\n",
       " 'ffill',\n",
       " 'fillna',\n",
       " 'filter',\n",
       " 'first',\n",
       " 'first_valid_index',\n",
       " 'flags',\n",
       " 'floordiv',\n",
       " 'from_dict',\n",
       " 'from_records',\n",
       " 'ge',\n",
       " 'get',\n",
       " 'groupby',\n",
       " 'gt',\n",
       " 'head',\n",
       " 'hist',\n",
       " 'iat',\n",
       " 'idxmax',\n",
       " 'idxmin',\n",
       " 'iloc',\n",
       " 'index',\n",
       " 'infer_objects',\n",
       " 'info',\n",
       " 'insert',\n",
       " 'interpolate',\n",
       " 'isin',\n",
       " 'isna',\n",
       " 'isnull',\n",
       " 'items',\n",
       " 'iteritems',\n",
       " 'iterrows',\n",
       " 'itertuples',\n",
       " 'join',\n",
       " 'keys',\n",
       " 'kurt',\n",
       " 'kurtosis',\n",
       " 'last',\n",
       " 'last_valid_index',\n",
       " 'le',\n",
       " 'loc',\n",
       " 'lookup',\n",
       " 'lt',\n",
       " 'mad',\n",
       " 'mask',\n",
       " 'max',\n",
       " 'mean',\n",
       " 'median',\n",
       " 'melt',\n",
       " 'memory_usage',\n",
       " 'merge',\n",
       " 'min',\n",
       " 'mod',\n",
       " 'mode',\n",
       " 'mul',\n",
       " 'multiply',\n",
       " 'ndim',\n",
       " 'ne',\n",
       " 'nlargest',\n",
       " 'notna',\n",
       " 'notnull',\n",
       " 'nsmallest',\n",
       " 'nunique',\n",
       " 'pad',\n",
       " 'pct_change',\n",
       " 'pipe',\n",
       " 'pivot',\n",
       " 'pivot_table',\n",
       " 'plot',\n",
       " 'pop',\n",
       " 'pow',\n",
       " 'prod',\n",
       " 'product',\n",
       " 'quantile',\n",
       " 'query',\n",
       " 'radd',\n",
       " 'rank',\n",
       " 'rdiv',\n",
       " 'reindex',\n",
       " 'reindex_like',\n",
       " 'rename',\n",
       " 'rename_axis',\n",
       " 'reorder_levels',\n",
       " 'replace',\n",
       " 'resample',\n",
       " 'reset_index',\n",
       " 'rfloordiv',\n",
       " 'rmod',\n",
       " 'rmul',\n",
       " 'rolling',\n",
       " 'round',\n",
       " 'rpow',\n",
       " 'rsub',\n",
       " 'rtruediv',\n",
       " 'sample',\n",
       " 'select_dtypes',\n",
       " 'sem',\n",
       " 'set_axis',\n",
       " 'set_flags',\n",
       " 'set_index',\n",
       " 'shape',\n",
       " 'shift',\n",
       " 'size',\n",
       " 'skew',\n",
       " 'slice_shift',\n",
       " 'sort_index',\n",
       " 'sort_values',\n",
       " 'squeeze',\n",
       " 'stack',\n",
       " 'std',\n",
       " 'style',\n",
       " 'sub',\n",
       " 'subtract',\n",
       " 'sum',\n",
       " 'swapaxes',\n",
       " 'swaplevel',\n",
       " 'tail',\n",
       " 'take',\n",
       " 'to_clipboard',\n",
       " 'to_csv',\n",
       " 'to_dict',\n",
       " 'to_excel',\n",
       " 'to_feather',\n",
       " 'to_gbq',\n",
       " 'to_hdf',\n",
       " 'to_html',\n",
       " 'to_json',\n",
       " 'to_latex',\n",
       " 'to_markdown',\n",
       " 'to_numpy',\n",
       " 'to_parquet',\n",
       " 'to_period',\n",
       " 'to_pickle',\n",
       " 'to_records',\n",
       " 'to_sql',\n",
       " 'to_stata',\n",
       " 'to_string',\n",
       " 'to_timestamp',\n",
       " 'to_xarray',\n",
       " 'to_xml',\n",
       " 'transform',\n",
       " 'transpose',\n",
       " 'truediv',\n",
       " 'truncate',\n",
       " 'tz_convert',\n",
       " 'tz_localize',\n",
       " 'unstack',\n",
       " 'update',\n",
       " 'value_counts',\n",
       " 'values',\n",
       " 'var',\n",
       " 'where',\n",
       " 'xs']"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dir(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53d1a20f",
   "metadata": {},
   "source": [
    "## dicts --> dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "id": "de704084",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = {\n",
    "    'name':['apple','egg','watermelon'],\n",
    "    'color':['red','yellow','green'],\n",
    "    'num':[30,40,50]\n",
    "}\n",
    "df = pd.DataFrame(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "4beef56d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>color</th>\n",
       "      <th>num</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>apple</td>\n",
       "      <td>red</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>egg</td>\n",
       "      <td>yellow</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>watermelon</td>\n",
       "      <td>green</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         name   color  num\n",
       "0       apple     red   30\n",
       "1         egg  yellow   40\n",
       "2  watermelon   green   50"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "id": "a4e31129",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(data,index=[\"1\",\"2\",\"3\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "id": "0dff8b4d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>color</th>\n",
       "      <th>num</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>apple</td>\n",
       "      <td>red</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>egg</td>\n",
       "      <td>yellow</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>watermelon</td>\n",
       "      <td>green</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         name   color  num\n",
       "1       apple     red   30\n",
       "2         egg  yellow   40\n",
       "3  watermelon   green   50"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "id": "e3a7d649",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3]"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index=list(range(1,4))\n",
    "index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "id": "45cab61b",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(data,index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "id": "9a037510",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>color</th>\n",
       "      <th>num</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>apple</td>\n",
       "      <td>red</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>egg</td>\n",
       "      <td>yellow</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>watermelon</td>\n",
       "      <td>green</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         name   color  num\n",
       "1       apple     red   30\n",
       "2         egg  yellow   40\n",
       "3  watermelon   green   50"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "9fc87fb6",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = {'name':['apple','egg','watermelon'],'color':['red','yellow','green'],'num':[30,40,50]}\n",
    "df = pd.DataFrame(data, index=['a', 'b', 'c'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "id": "8fa3cd3b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>color</th>\n",
       "      <th>num</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>apple</td>\n",
       "      <td>red</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>egg</td>\n",
       "      <td>yellow</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>watermelon</td>\n",
       "      <td>green</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         name   color  num\n",
       "a       apple     red   30\n",
       "b         egg  yellow   40\n",
       "c  watermelon   green   50"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a521735d",
   "metadata": {},
   "source": [
    "## 读数据head()、tail() & 取数据loc()、iloc()\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "319cfc4f",
   "metadata": {},
   "source": [
    "#### head()、tail()\n",
    "head:Return the first `n` rows.\n",
    "\n",
    "tail:Return the last `n` rows.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 189,
   "id": "bac47314",
   "metadata": {},
   "outputs": [],
   "source": [
    "arr=np.arange(0,81).reshape(9,9)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 190,
   "id": "c1e36015",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],\n",
       "       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],\n",
       "       [18, 19, 20, 21, 22, 23, 24, 25, 26],\n",
       "       [27, 28, 29, 30, 31, 32, 33, 34, 35],\n",
       "       [36, 37, 38, 39, 40, 41, 42, 43, 44],\n",
       "       [45, 46, 47, 48, 49, 50, 51, 52, 53],\n",
       "       [54, 55, 56, 57, 58, 59, 60, 61, 62],\n",
       "       [63, 64, 65, 66, 67, 68, 69, 70, 71],\n",
       "       [72, 73, 74, 75, 76, 77, 78, 79, 80]])"
      ]
     },
     "execution_count": 190,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 191,
   "id": "9356ca86",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(arr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 192,
   "id": "02c9f9cd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "      <td>16</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18</td>\n",
       "      <td>19</td>\n",
       "      <td>20</td>\n",
       "      <td>21</td>\n",
       "      <td>22</td>\n",
       "      <td>23</td>\n",
       "      <td>24</td>\n",
       "      <td>25</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>27</td>\n",
       "      <td>28</td>\n",
       "      <td>29</td>\n",
       "      <td>30</td>\n",
       "      <td>31</td>\n",
       "      <td>32</td>\n",
       "      <td>33</td>\n",
       "      <td>34</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>36</td>\n",
       "      <td>37</td>\n",
       "      <td>38</td>\n",
       "      <td>39</td>\n",
       "      <td>40</td>\n",
       "      <td>41</td>\n",
       "      <td>42</td>\n",
       "      <td>43</td>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>45</td>\n",
       "      <td>46</td>\n",
       "      <td>47</td>\n",
       "      <td>48</td>\n",
       "      <td>49</td>\n",
       "      <td>50</td>\n",
       "      <td>51</td>\n",
       "      <td>52</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>54</td>\n",
       "      <td>55</td>\n",
       "      <td>56</td>\n",
       "      <td>57</td>\n",
       "      <td>58</td>\n",
       "      <td>59</td>\n",
       "      <td>60</td>\n",
       "      <td>61</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>63</td>\n",
       "      <td>64</td>\n",
       "      <td>65</td>\n",
       "      <td>66</td>\n",
       "      <td>67</td>\n",
       "      <td>68</td>\n",
       "      <td>69</td>\n",
       "      <td>70</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>72</td>\n",
       "      <td>73</td>\n",
       "      <td>74</td>\n",
       "      <td>75</td>\n",
       "      <td>76</td>\n",
       "      <td>77</td>\n",
       "      <td>78</td>\n",
       "      <td>79</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4   5   6   7   8\n",
       "0   0   1   2   3   4   5   6   7   8\n",
       "1   9  10  11  12  13  14  15  16  17\n",
       "2  18  19  20  21  22  23  24  25  26\n",
       "3  27  28  29  30  31  32  33  34  35\n",
       "4  36  37  38  39  40  41  42  43  44\n",
       "5  45  46  47  48  49  50  51  52  53\n",
       "6  54  55  56  57  58  59  60  61  62\n",
       "7  63  64  65  66  67  68  69  70  71\n",
       "8  72  73  74  75  76  77  78  79  80"
      ]
     },
     "execution_count": 192,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 193,
   "id": "13e73f6b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "      <td>16</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18</td>\n",
       "      <td>19</td>\n",
       "      <td>20</td>\n",
       "      <td>21</td>\n",
       "      <td>22</td>\n",
       "      <td>23</td>\n",
       "      <td>24</td>\n",
       "      <td>25</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>27</td>\n",
       "      <td>28</td>\n",
       "      <td>29</td>\n",
       "      <td>30</td>\n",
       "      <td>31</td>\n",
       "      <td>32</td>\n",
       "      <td>33</td>\n",
       "      <td>34</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>36</td>\n",
       "      <td>37</td>\n",
       "      <td>38</td>\n",
       "      <td>39</td>\n",
       "      <td>40</td>\n",
       "      <td>41</td>\n",
       "      <td>42</td>\n",
       "      <td>43</td>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4   5   6   7   8\n",
       "0   0   1   2   3   4   5   6   7   8\n",
       "1   9  10  11  12  13  14  15  16  17\n",
       "2  18  19  20  21  22  23  24  25  26\n",
       "3  27  28  29  30  31  32  33  34  35\n",
       "4  36  37  38  39  40  41  42  43  44"
      ]
     },
     "execution_count": 193,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()##读表头"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 194,
   "id": "62ac160d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "      <td>16</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18</td>\n",
       "      <td>19</td>\n",
       "      <td>20</td>\n",
       "      <td>21</td>\n",
       "      <td>22</td>\n",
       "      <td>23</td>\n",
       "      <td>24</td>\n",
       "      <td>25</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4   5   6   7   8\n",
       "0   0   1   2   3   4   5   6   7   8\n",
       "1   9  10  11  12  13  14  15  16  17\n",
       "2  18  19  20  21  22  23  24  25  26"
      ]
     },
     "execution_count": 194,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "id": "de431a54",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>36</td>\n",
       "      <td>37</td>\n",
       "      <td>38</td>\n",
       "      <td>39</td>\n",
       "      <td>40</td>\n",
       "      <td>41</td>\n",
       "      <td>42</td>\n",
       "      <td>43</td>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>45</td>\n",
       "      <td>46</td>\n",
       "      <td>47</td>\n",
       "      <td>48</td>\n",
       "      <td>49</td>\n",
       "      <td>50</td>\n",
       "      <td>51</td>\n",
       "      <td>52</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>54</td>\n",
       "      <td>55</td>\n",
       "      <td>56</td>\n",
       "      <td>57</td>\n",
       "      <td>58</td>\n",
       "      <td>59</td>\n",
       "      <td>60</td>\n",
       "      <td>61</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>63</td>\n",
       "      <td>64</td>\n",
       "      <td>65</td>\n",
       "      <td>66</td>\n",
       "      <td>67</td>\n",
       "      <td>68</td>\n",
       "      <td>69</td>\n",
       "      <td>70</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>72</td>\n",
       "      <td>73</td>\n",
       "      <td>74</td>\n",
       "      <td>75</td>\n",
       "      <td>76</td>\n",
       "      <td>77</td>\n",
       "      <td>78</td>\n",
       "      <td>79</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4   5   6   7   8\n",
       "4  36  37  38  39  40  41  42  43  44\n",
       "5  45  46  47  48  49  50  51  52  53\n",
       "6  54  55  56  57  58  59  60  61  62\n",
       "7  63  64  65  66  67  68  69  70  71\n",
       "8  72  73  74  75  76  77  78  79  80"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()#读表尾"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "id": "0180fb27",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>63</td>\n",
       "      <td>64</td>\n",
       "      <td>65</td>\n",
       "      <td>66</td>\n",
       "      <td>67</td>\n",
       "      <td>68</td>\n",
       "      <td>69</td>\n",
       "      <td>70</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>72</td>\n",
       "      <td>73</td>\n",
       "      <td>74</td>\n",
       "      <td>75</td>\n",
       "      <td>76</td>\n",
       "      <td>77</td>\n",
       "      <td>78</td>\n",
       "      <td>79</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4   5   6   7   8\n",
       "7  63  64  65  66  67  68  69  70  71\n",
       "8  72  73  74  75  76  77  78  79  80"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7560d2e2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "120cf36c",
   "metadata": {},
   "source": [
    "#### Using loc or iloc to access the elements\n",
    "#loc(): Access a group of rows and columns by label(s) or a boolean array.\n",
    "\n",
    "#iloc(): Purely integer-location based indexing for selection by position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 195,
   "id": "63ed2301",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R4</th>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R5</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C1  C2  C3\n",
       "R1   1   6  11\n",
       "R2   2   7  12\n",
       "R3   3   8  13\n",
       "R4   4   9  14\n",
       "R5   5  10  15"
      ]
     },
     "execution_count": 195,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 创建一个DataFrame\n",
    "data = {'C1': [1, 2, 3, 4, 5], 'C2': [6, 7, 8, 9, 10], 'C3': [11, 12, 13, 14, 15]}\n",
    "df = pd.DataFrame(data,index=['R1','R2','R3','R4','R5'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "id": "cf654bc7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "C1     1\n",
       "C2     6\n",
       "C3    11\n",
       "Name: R1, dtype: int64"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\"R1\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "id": "ed5dcf5e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R1     6\n",
       "R2     7\n",
       "R3     8\n",
       "R4     9\n",
       "R5    10\n",
       "Name: C2, dtype: int64"
      ]
     },
     "execution_count": 161,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:,\"C2\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "id": "15093e13",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R4</th>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R5</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C1  C2  C3\n",
       "R1   1   6  11\n",
       "R2   2   7  12\n",
       "R3   3   8  13\n",
       "R4   4   9  14\n",
       "R5   5  10  15"
      ]
     },
     "execution_count": 163,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:,:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "id": "9aff45e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R4</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R5</th>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C2\n",
       "R1   6\n",
       "R2   7\n",
       "R3   8\n",
       "R4   9\n",
       "R5  10"
      ]
     },
     "execution_count": 164,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:,1:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "id": "7292172e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "C1    5\n",
       "C2    5\n",
       "C3    5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "id": "7ecb9adf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 167,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"C1\"].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "id": "f1194c81",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    1\n",
       "2    1\n",
       "3    1\n",
       "4    1\n",
       "5    1\n",
       "Name: C1, dtype: int64"
      ]
     },
     "execution_count": 172,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"C1\"].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "id": "78707e43",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['C1',\n",
       " 'C2',\n",
       " 'C3',\n",
       " 'T',\n",
       " '_AXIS_LEN',\n",
       " '_AXIS_ORDERS',\n",
       " '_AXIS_REVERSED',\n",
       " '_AXIS_TO_AXIS_NUMBER',\n",
       " '_HANDLED_TYPES',\n",
       " '__abs__',\n",
       " '__add__',\n",
       " '__and__',\n",
       " '__annotations__',\n",
       " '__array__',\n",
       " '__array_priority__',\n",
       " '__array_ufunc__',\n",
       " '__array_wrap__',\n",
       " '__bool__',\n",
       " '__class__',\n",
       " '__contains__',\n",
       " '__copy__',\n",
       " '__deepcopy__',\n",
       " '__delattr__',\n",
       " '__delitem__',\n",
       " '__dict__',\n",
       " '__dir__',\n",
       " '__divmod__',\n",
       " '__doc__',\n",
       " '__eq__',\n",
       " '__finalize__',\n",
       " '__floordiv__',\n",
       " '__format__',\n",
       " '__ge__',\n",
       " '__getattr__',\n",
       " '__getattribute__',\n",
       " '__getitem__',\n",
       " '__getstate__',\n",
       " '__gt__',\n",
       " '__hash__',\n",
       " '__iadd__',\n",
       " '__iand__',\n",
       " '__ifloordiv__',\n",
       " '__imod__',\n",
       " '__imul__',\n",
       " '__init__',\n",
       " '__init_subclass__',\n",
       " '__invert__',\n",
       " '__ior__',\n",
       " '__ipow__',\n",
       " '__isub__',\n",
       " '__iter__',\n",
       " '__itruediv__',\n",
       " '__ixor__',\n",
       " '__le__',\n",
       " '__len__',\n",
       " '__lt__',\n",
       " '__matmul__',\n",
       " '__mod__',\n",
       " '__module__',\n",
       " '__mul__',\n",
       " '__ne__',\n",
       " '__neg__',\n",
       " '__new__',\n",
       " '__nonzero__',\n",
       " '__or__',\n",
       " '__pos__',\n",
       " '__pow__',\n",
       " '__radd__',\n",
       " '__rand__',\n",
       " '__rdivmod__',\n",
       " '__reduce__',\n",
       " '__reduce_ex__',\n",
       " '__repr__',\n",
       " '__rfloordiv__',\n",
       " '__rmatmul__',\n",
       " '__rmod__',\n",
       " '__rmul__',\n",
       " '__ror__',\n",
       " '__round__',\n",
       " '__rpow__',\n",
       " '__rsub__',\n",
       " '__rtruediv__',\n",
       " '__rxor__',\n",
       " '__setattr__',\n",
       " '__setitem__',\n",
       " '__setstate__',\n",
       " '__sizeof__',\n",
       " '__str__',\n",
       " '__sub__',\n",
       " '__subclasshook__',\n",
       " '__truediv__',\n",
       " '__weakref__',\n",
       " '__xor__',\n",
       " '_accessors',\n",
       " '_accum_func',\n",
       " '_add_numeric_operations',\n",
       " '_agg_by_level',\n",
       " '_agg_examples_doc',\n",
       " '_agg_summary_and_see_also_doc',\n",
       " '_align_frame',\n",
       " '_align_series',\n",
       " '_arith_method',\n",
       " '_as_manager',\n",
       " '_attrs',\n",
       " '_box_col_values',\n",
       " '_can_fast_transpose',\n",
       " '_check_inplace_and_allows_duplicate_labels',\n",
       " '_check_inplace_setting',\n",
       " '_check_is_chained_assignment_possible',\n",
       " '_check_label_or_level_ambiguity',\n",
       " '_check_setitem_copy',\n",
       " '_clear_item_cache',\n",
       " '_clip_with_one_bound',\n",
       " '_clip_with_scalar',\n",
       " '_cmp_method',\n",
       " '_combine_frame',\n",
       " '_consolidate',\n",
       " '_consolidate_inplace',\n",
       " '_construct_axes_dict',\n",
       " '_construct_axes_from_arguments',\n",
       " '_construct_result',\n",
       " '_constructor',\n",
       " '_constructor_sliced',\n",
       " '_convert',\n",
       " '_count_level',\n",
       " '_data',\n",
       " '_dir_additions',\n",
       " '_dir_deletions',\n",
       " '_dispatch_frame_op',\n",
       " '_drop_axis',\n",
       " '_drop_labels_or_levels',\n",
       " '_ensure_valid_index',\n",
       " '_find_valid_index',\n",
       " '_flags',\n",
       " '_from_arrays',\n",
       " '_from_mgr',\n",
       " '_get_agg_axis',\n",
       " '_get_axis',\n",
       " '_get_axis_name',\n",
       " '_get_axis_number',\n",
       " '_get_axis_resolvers',\n",
       " '_get_block_manager_axis',\n",
       " '_get_bool_data',\n",
       " '_get_cleaned_column_resolvers',\n",
       " '_get_column_array',\n",
       " '_get_index_resolvers',\n",
       " '_get_item_cache',\n",
       " '_get_label_or_level_values',\n",
       " '_get_numeric_data',\n",
       " '_get_value',\n",
       " '_getitem_bool_array',\n",
       " '_getitem_multilevel',\n",
       " '_gotitem',\n",
       " '_hidden_attrs',\n",
       " '_indexed_same',\n",
       " '_info_axis',\n",
       " '_info_axis_name',\n",
       " '_info_axis_number',\n",
       " '_info_repr',\n",
       " '_init_mgr',\n",
       " '_inplace_method',\n",
       " '_internal_names',\n",
       " '_internal_names_set',\n",
       " '_is_copy',\n",
       " '_is_homogeneous_type',\n",
       " '_is_label_or_level_reference',\n",
       " '_is_label_reference',\n",
       " '_is_level_reference',\n",
       " '_is_mixed_type',\n",
       " '_is_view',\n",
       " '_iset_item',\n",
       " '_iset_item_mgr',\n",
       " '_iset_not_inplace',\n",
       " '_item_cache',\n",
       " '_iter_column_arrays',\n",
       " '_ixs',\n",
       " '_join_compat',\n",
       " '_logical_func',\n",
       " '_logical_method',\n",
       " '_maybe_cache_changed',\n",
       " '_maybe_update_cacher',\n",
       " '_metadata',\n",
       " '_mgr',\n",
       " '_min_count_stat_function',\n",
       " '_needs_reindex_multi',\n",
       " '_protect_consolidate',\n",
       " '_reduce',\n",
       " '_reindex_axes',\n",
       " '_reindex_columns',\n",
       " '_reindex_index',\n",
       " '_reindex_multi',\n",
       " '_reindex_with_indexers',\n",
       " '_replace_columnwise',\n",
       " '_repr_data_resource_',\n",
       " '_repr_fits_horizontal_',\n",
       " '_repr_fits_vertical_',\n",
       " '_repr_html_',\n",
       " '_repr_latex_',\n",
       " '_reset_cache',\n",
       " '_reset_cacher',\n",
       " '_sanitize_column',\n",
       " '_series',\n",
       " '_set_axis',\n",
       " '_set_axis_name',\n",
       " '_set_axis_nocheck',\n",
       " '_set_is_copy',\n",
       " '_set_item',\n",
       " '_set_item_frame_value',\n",
       " '_set_item_mgr',\n",
       " '_set_value',\n",
       " '_setitem_array',\n",
       " '_setitem_frame',\n",
       " '_setitem_slice',\n",
       " '_slice',\n",
       " '_stat_axis',\n",
       " '_stat_axis_name',\n",
       " '_stat_axis_number',\n",
       " '_stat_function',\n",
       " '_stat_function_ddof',\n",
       " '_take_with_is_copy',\n",
       " '_to_dict_of_blocks',\n",
       " '_typ',\n",
       " '_update_inplace',\n",
       " '_validate_dtype',\n",
       " '_values',\n",
       " '_where',\n",
       " 'abs',\n",
       " 'add',\n",
       " 'add_prefix',\n",
       " 'add_suffix',\n",
       " 'agg',\n",
       " 'aggregate',\n",
       " 'align',\n",
       " 'all',\n",
       " 'any',\n",
       " 'append',\n",
       " 'apply',\n",
       " 'applymap',\n",
       " 'asfreq',\n",
       " 'asof',\n",
       " 'assign',\n",
       " 'astype',\n",
       " 'at',\n",
       " 'at_time',\n",
       " 'attrs',\n",
       " 'axes',\n",
       " 'backfill',\n",
       " 'between_time',\n",
       " 'bfill',\n",
       " 'bool',\n",
       " 'boxplot',\n",
       " 'clip',\n",
       " 'columns',\n",
       " 'combine',\n",
       " 'combine_first',\n",
       " 'compare',\n",
       " 'convert_dtypes',\n",
       " 'copy',\n",
       " 'corr',\n",
       " 'corrwith',\n",
       " 'count',\n",
       " 'cov',\n",
       " 'cummax',\n",
       " 'cummin',\n",
       " 'cumprod',\n",
       " 'cumsum',\n",
       " 'describe',\n",
       " 'diff',\n",
       " 'div',\n",
       " 'divide',\n",
       " 'dot',\n",
       " 'drop',\n",
       " 'drop_duplicates',\n",
       " 'droplevel',\n",
       " 'dropna',\n",
       " 'dtypes',\n",
       " 'duplicated',\n",
       " 'empty',\n",
       " 'eq',\n",
       " 'equals',\n",
       " 'eval',\n",
       " 'ewm',\n",
       " 'expanding',\n",
       " 'explode',\n",
       " 'ffill',\n",
       " 'fillna',\n",
       " 'filter',\n",
       " 'first',\n",
       " 'first_valid_index',\n",
       " 'flags',\n",
       " 'floordiv',\n",
       " 'from_dict',\n",
       " 'from_records',\n",
       " 'ge',\n",
       " 'get',\n",
       " 'groupby',\n",
       " 'gt',\n",
       " 'head',\n",
       " 'hist',\n",
       " 'iat',\n",
       " 'idxmax',\n",
       " 'idxmin',\n",
       " 'iloc',\n",
       " 'index',\n",
       " 'infer_objects',\n",
       " 'info',\n",
       " 'insert',\n",
       " 'interpolate',\n",
       " 'isin',\n",
       " 'isna',\n",
       " 'isnull',\n",
       " 'items',\n",
       " 'iteritems',\n",
       " 'iterrows',\n",
       " 'itertuples',\n",
       " 'join',\n",
       " 'keys',\n",
       " 'kurt',\n",
       " 'kurtosis',\n",
       " 'last',\n",
       " 'last_valid_index',\n",
       " 'le',\n",
       " 'loc',\n",
       " 'lookup',\n",
       " 'lt',\n",
       " 'mad',\n",
       " 'mask',\n",
       " 'max',\n",
       " 'mean',\n",
       " 'median',\n",
       " 'melt',\n",
       " 'memory_usage',\n",
       " 'merge',\n",
       " 'min',\n",
       " 'mod',\n",
       " 'mode',\n",
       " 'mul',\n",
       " 'multiply',\n",
       " 'ndim',\n",
       " 'ne',\n",
       " 'nlargest',\n",
       " 'notna',\n",
       " 'notnull',\n",
       " 'nsmallest',\n",
       " 'nunique',\n",
       " 'pad',\n",
       " 'pct_change',\n",
       " 'pipe',\n",
       " 'pivot',\n",
       " 'pivot_table',\n",
       " 'plot',\n",
       " 'pop',\n",
       " 'pow',\n",
       " 'prod',\n",
       " 'product',\n",
       " 'quantile',\n",
       " 'query',\n",
       " 'radd',\n",
       " 'rank',\n",
       " 'rdiv',\n",
       " 'reindex',\n",
       " 'reindex_like',\n",
       " 'rename',\n",
       " 'rename_axis',\n",
       " 'reorder_levels',\n",
       " 'replace',\n",
       " 'resample',\n",
       " 'reset_index',\n",
       " 'rfloordiv',\n",
       " 'rmod',\n",
       " 'rmul',\n",
       " 'rolling',\n",
       " 'round',\n",
       " 'rpow',\n",
       " 'rsub',\n",
       " 'rtruediv',\n",
       " 'sample',\n",
       " 'select_dtypes',\n",
       " 'sem',\n",
       " 'set_axis',\n",
       " 'set_flags',\n",
       " 'set_index',\n",
       " 'shape',\n",
       " 'shift',\n",
       " 'size',\n",
       " 'skew',\n",
       " 'slice_shift',\n",
       " 'sort_index',\n",
       " 'sort_values',\n",
       " 'squeeze',\n",
       " 'stack',\n",
       " 'std',\n",
       " 'style',\n",
       " 'sub',\n",
       " 'subtract',\n",
       " 'sum',\n",
       " 'swapaxes',\n",
       " 'swaplevel',\n",
       " 'tail',\n",
       " 'take',\n",
       " 'to_clipboard',\n",
       " 'to_csv',\n",
       " 'to_dict',\n",
       " 'to_excel',\n",
       " 'to_feather',\n",
       " 'to_gbq',\n",
       " 'to_hdf',\n",
       " 'to_html',\n",
       " 'to_json',\n",
       " 'to_latex',\n",
       " 'to_markdown',\n",
       " 'to_numpy',\n",
       " 'to_parquet',\n",
       " 'to_period',\n",
       " 'to_pickle',\n",
       " 'to_records',\n",
       " 'to_sql',\n",
       " 'to_stata',\n",
       " 'to_string',\n",
       " 'to_timestamp',\n",
       " 'to_xarray',\n",
       " 'to_xml',\n",
       " 'transform',\n",
       " 'transpose',\n",
       " 'truediv',\n",
       " 'truncate',\n",
       " 'tz_convert',\n",
       " 'tz_localize',\n",
       " 'unstack',\n",
       " 'update',\n",
       " 'value_counts',\n",
       " 'values',\n",
       " 'var',\n",
       " 'where',\n",
       " 'xs']"
      ]
     },
     "execution_count": 173,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dir(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 199,
   "id": "6863fa43",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R4</th>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R5</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C1  C2  C3\n",
       "R1   1   6  11\n",
       "R2   2   7  12\n",
       "R3   3   8  13\n",
       "R4   4   9  14\n",
       "R5   5  10  15"
      ]
     },
     "execution_count": 199,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 200,
   "id": "8477c848",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R1    1\n",
       "R2    2\n",
       "R3    3\n",
       "R4    4\n",
       "R5    5\n",
       "Name: C1, dtype: int64"
      ]
     },
     "execution_count": 200,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"C1\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 207,
   "id": "3f94799c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R1    False\n",
       "R2    False\n",
       "R3     True\n",
       "R4     True\n",
       "R5     True\n",
       "Name: C1, dtype: bool"
      ]
     },
     "execution_count": 207,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"C1\"]>2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 206,
   "id": "f3837837",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R4</th>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R5</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C1  C2  C3\n",
       "R3   3   8  13\n",
       "R4   4   9  14\n",
       "R5   5  10  15"
      ]
     },
     "execution_count": 206,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"C1\"]>2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 210,
   "id": "28fc480f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>6</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>7</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R4</th>\n",
       "      <td>4.0</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R5</th>\n",
       "      <td>5.0</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     C1  C2  C3\n",
       "R1  NaN   6  11\n",
       "R2  NaN   7  12\n",
       "R3  3.0   8  13\n",
       "R4  4.0   9  14\n",
       "R5  5.0  10  15"
      ]
     },
     "execution_count": 210,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[:]>2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5aa1ec02",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b77f8d79",
   "metadata": {},
   "source": [
    "### convert Dataframes into array "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bc1b2e6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 186,
   "id": "20a8f465",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 186,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "id": "4a2daed6",
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = df.iloc[:].values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 184,
   "id": "f506676f",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1,  6, 11],\n",
       "       [ 2,  7, 12],\n",
       "       [ 3,  8, 13],\n",
       "       [ 4,  9, 14],\n",
       "       [ 5, 10, 15]], dtype=int64)"
      ]
     },
     "execution_count": 184,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 185,
   "id": "b7d3ee64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.ndarray"
      ]
     },
     "execution_count": 185,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(arr)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89fa0076",
   "metadata": {},
   "source": [
    "### 读取CSV文件&获取基本表信息\n",
    "\n",
    "info: Print a concise summary of a DataFrame.\n",
    "\n",
    "\"info()\"函数是一个常用于获取数据对象信息的函数，其具体功能和用途可能因不同的编程语言和环境而异。\n",
    "\n",
    "在Pandas库中，\"info()\"函数可以给出样本数据的相关信息概览，包括行数、列数、列索引、列非空值个数、列类型以及内存占用等。例如，你可以使用data.info()查看数据集的基本信息。\n",
    "\n",
    "此外，在Python标准库的模块inspect中，也有一个同名的\"info()\"函数，它主要用于获取对象的信息，如类、函数和方法等，并可以用来检查对象的类型、方法和属性等信息。\n",
    "\n",
    "需要注意的是，\"info()\"函数并非Excel中的内置函数，但在某些情况下，可能会用到与“INFO”类似的函数来获取特定的系统信息。因此，在使用\"info()\"函数时，需要根据具体的环境和需求来确定其含义和用法。\n",
    "\n",
    "describe：Generate descriptive statistics.\n",
    "\n",
    "\"describe()\"函数是pandas库中的一个方法，主要用于生成数据集的统计描述。它返回给定数据集的主要统计量，例如平均值、标准差、最小值、最大值和四分位数等。\n",
    "\n",
    "在一维数组中，\"describe()\"会返回一系列参数，包括count（数组的个数），mean（数组的平均值），std（数组的标准差），min（最小值），25%（25%分位数，排序之后排在25%位置的值），50%（中位数，排序之后排在50%位置的值），75%（75%分位数，排序之后排在75%位置的值）和max（最大值）。\n",
    "\n",
    "此外，对于DataFrame中的连续值数据，\"describe()\"函数还可以查看基本情况，如每一列非空值的数量，每一列的平均值，每一列的标准差，最小值以及四分位数等。这些信息可以帮助我们快速了解数据集的基本特性，对数据进行有效的统计分析。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 214,
   "id": "6087f858",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>y</th>\n",
       "      <th>X0</th>\n",
       "      <th>X1</th>\n",
       "      <th>X2</th>\n",
       "      <th>X3</th>\n",
       "      <th>X4</th>\n",
       "      <th>X5</th>\n",
       "      <th>X6</th>\n",
       "      <th>X8</th>\n",
       "      <th>...</th>\n",
       "      <th>X375</th>\n",
       "      <th>X376</th>\n",
       "      <th>X377</th>\n",
       "      <th>X378</th>\n",
       "      <th>X379</th>\n",
       "      <th>X380</th>\n",
       "      <th>X382</th>\n",
       "      <th>X383</th>\n",
       "      <th>X384</th>\n",
       "      <th>X385</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>130.81</td>\n",
       "      <td>k</td>\n",
       "      <td>v</td>\n",
       "      <td>at</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>u</td>\n",
       "      <td>j</td>\n",
       "      <td>o</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>6</td>\n",
       "      <td>88.53</td>\n",
       "      <td>k</td>\n",
       "      <td>t</td>\n",
       "      <td>av</td>\n",
       "      <td>e</td>\n",
       "      <td>d</td>\n",
       "      <td>y</td>\n",
       "      <td>l</td>\n",
       "      <td>o</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>76.26</td>\n",
       "      <td>az</td>\n",
       "      <td>w</td>\n",
       "      <td>n</td>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>x</td>\n",
       "      <td>j</td>\n",
       "      <td>x</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>80.62</td>\n",
       "      <td>az</td>\n",
       "      <td>t</td>\n",
       "      <td>n</td>\n",
       "      <td>f</td>\n",
       "      <td>d</td>\n",
       "      <td>x</td>\n",
       "      <td>l</td>\n",
       "      <td>e</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>13</td>\n",
       "      <td>78.02</td>\n",
       "      <td>az</td>\n",
       "      <td>v</td>\n",
       "      <td>n</td>\n",
       "      <td>f</td>\n",
       "      <td>d</td>\n",
       "      <td>h</td>\n",
       "      <td>d</td>\n",
       "      <td>n</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4204</th>\n",
       "      <td>8405</td>\n",
       "      <td>107.39</td>\n",
       "      <td>ak</td>\n",
       "      <td>s</td>\n",
       "      <td>as</td>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>d</td>\n",
       "      <td>q</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4205</th>\n",
       "      <td>8406</td>\n",
       "      <td>108.77</td>\n",
       "      <td>j</td>\n",
       "      <td>o</td>\n",
       "      <td>t</td>\n",
       "      <td>d</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>h</td>\n",
       "      <td>h</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4206</th>\n",
       "      <td>8412</td>\n",
       "      <td>109.22</td>\n",
       "      <td>ak</td>\n",
       "      <td>v</td>\n",
       "      <td>r</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>g</td>\n",
       "      <td>e</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4207</th>\n",
       "      <td>8415</td>\n",
       "      <td>87.48</td>\n",
       "      <td>al</td>\n",
       "      <td>r</td>\n",
       "      <td>e</td>\n",
       "      <td>f</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>l</td>\n",
       "      <td>u</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4208</th>\n",
       "      <td>8417</td>\n",
       "      <td>110.85</td>\n",
       "      <td>z</td>\n",
       "      <td>r</td>\n",
       "      <td>ae</td>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>g</td>\n",
       "      <td>w</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4209 rows × 378 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        ID       y  X0 X1  X2 X3 X4  X5 X6 X8  ...  X375  X376  X377  X378  \\\n",
       "0        0  130.81   k  v  at  a  d   u  j  o  ...     0     0     1     0   \n",
       "1        6   88.53   k  t  av  e  d   y  l  o  ...     1     0     0     0   \n",
       "2        7   76.26  az  w   n  c  d   x  j  x  ...     0     0     0     0   \n",
       "3        9   80.62  az  t   n  f  d   x  l  e  ...     0     0     0     0   \n",
       "4       13   78.02  az  v   n  f  d   h  d  n  ...     0     0     0     0   \n",
       "...    ...     ...  .. ..  .. .. ..  .. .. ..  ...   ...   ...   ...   ...   \n",
       "4204  8405  107.39  ak  s  as  c  d  aa  d  q  ...     1     0     0     0   \n",
       "4205  8406  108.77   j  o   t  d  d  aa  h  h  ...     0     1     0     0   \n",
       "4206  8412  109.22  ak  v   r  a  d  aa  g  e  ...     0     0     1     0   \n",
       "4207  8415   87.48  al  r   e  f  d  aa  l  u  ...     0     0     0     0   \n",
       "4208  8417  110.85   z  r  ae  c  d  aa  g  w  ...     1     0     0     0   \n",
       "\n",
       "      X379  X380  X382  X383  X384  X385  \n",
       "0        0     0     0     0     0     0  \n",
       "1        0     0     0     0     0     0  \n",
       "2        0     0     1     0     0     0  \n",
       "3        0     0     0     0     0     0  \n",
       "4        0     0     0     0     0     0  \n",
       "...    ...   ...   ...   ...   ...   ...  \n",
       "4204     0     0     0     0     0     0  \n",
       "4205     0     0     0     0     0     0  \n",
       "4206     0     0     0     0     0     0  \n",
       "4207     0     0     0     0     0     0  \n",
       "4208     0     0     0     0     0     0  \n",
       "\n",
       "[4209 rows x 378 columns]"
      ]
     },
     "execution_count": 214,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#path:D:\\零基础入门Python数据分析\\python入门基础\\mercedesbenz.csv\n",
    "pd.read_csv('D:/零基础入门Python数据分析/python入门基础/mercedesbenz.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 215,
   "id": "9c05c4ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "df=pd.read_csv('D:/零基础入门Python数据分析/python入门基础/mercedesbenz.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 216,
   "id": "b7385b52",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>y</th>\n",
       "      <th>X0</th>\n",
       "      <th>X1</th>\n",
       "      <th>X2</th>\n",
       "      <th>X3</th>\n",
       "      <th>X4</th>\n",
       "      <th>X5</th>\n",
       "      <th>X6</th>\n",
       "      <th>X8</th>\n",
       "      <th>...</th>\n",
       "      <th>X375</th>\n",
       "      <th>X376</th>\n",
       "      <th>X377</th>\n",
       "      <th>X378</th>\n",
       "      <th>X379</th>\n",
       "      <th>X380</th>\n",
       "      <th>X382</th>\n",
       "      <th>X383</th>\n",
       "      <th>X384</th>\n",
       "      <th>X385</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>130.81</td>\n",
       "      <td>k</td>\n",
       "      <td>v</td>\n",
       "      <td>at</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>u</td>\n",
       "      <td>j</td>\n",
       "      <td>o</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>6</td>\n",
       "      <td>88.53</td>\n",
       "      <td>k</td>\n",
       "      <td>t</td>\n",
       "      <td>av</td>\n",
       "      <td>e</td>\n",
       "      <td>d</td>\n",
       "      <td>y</td>\n",
       "      <td>l</td>\n",
       "      <td>o</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>76.26</td>\n",
       "      <td>az</td>\n",
       "      <td>w</td>\n",
       "      <td>n</td>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>x</td>\n",
       "      <td>j</td>\n",
       "      <td>x</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>80.62</td>\n",
       "      <td>az</td>\n",
       "      <td>t</td>\n",
       "      <td>n</td>\n",
       "      <td>f</td>\n",
       "      <td>d</td>\n",
       "      <td>x</td>\n",
       "      <td>l</td>\n",
       "      <td>e</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>13</td>\n",
       "      <td>78.02</td>\n",
       "      <td>az</td>\n",
       "      <td>v</td>\n",
       "      <td>n</td>\n",
       "      <td>f</td>\n",
       "      <td>d</td>\n",
       "      <td>h</td>\n",
       "      <td>d</td>\n",
       "      <td>n</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4204</th>\n",
       "      <td>8405</td>\n",
       "      <td>107.39</td>\n",
       "      <td>ak</td>\n",
       "      <td>s</td>\n",
       "      <td>as</td>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>d</td>\n",
       "      <td>q</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4205</th>\n",
       "      <td>8406</td>\n",
       "      <td>108.77</td>\n",
       "      <td>j</td>\n",
       "      <td>o</td>\n",
       "      <td>t</td>\n",
       "      <td>d</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>h</td>\n",
       "      <td>h</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4206</th>\n",
       "      <td>8412</td>\n",
       "      <td>109.22</td>\n",
       "      <td>ak</td>\n",
       "      <td>v</td>\n",
       "      <td>r</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>g</td>\n",
       "      <td>e</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4207</th>\n",
       "      <td>8415</td>\n",
       "      <td>87.48</td>\n",
       "      <td>al</td>\n",
       "      <td>r</td>\n",
       "      <td>e</td>\n",
       "      <td>f</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>l</td>\n",
       "      <td>u</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4208</th>\n",
       "      <td>8417</td>\n",
       "      <td>110.85</td>\n",
       "      <td>z</td>\n",
       "      <td>r</td>\n",
       "      <td>ae</td>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>aa</td>\n",
       "      <td>g</td>\n",
       "      <td>w</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4209 rows × 378 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        ID       y  X0 X1  X2 X3 X4  X5 X6 X8  ...  X375  X376  X377  X378  \\\n",
       "0        0  130.81   k  v  at  a  d   u  j  o  ...     0     0     1     0   \n",
       "1        6   88.53   k  t  av  e  d   y  l  o  ...     1     0     0     0   \n",
       "2        7   76.26  az  w   n  c  d   x  j  x  ...     0     0     0     0   \n",
       "3        9   80.62  az  t   n  f  d   x  l  e  ...     0     0     0     0   \n",
       "4       13   78.02  az  v   n  f  d   h  d  n  ...     0     0     0     0   \n",
       "...    ...     ...  .. ..  .. .. ..  .. .. ..  ...   ...   ...   ...   ...   \n",
       "4204  8405  107.39  ak  s  as  c  d  aa  d  q  ...     1     0     0     0   \n",
       "4205  8406  108.77   j  o   t  d  d  aa  h  h  ...     0     1     0     0   \n",
       "4206  8412  109.22  ak  v   r  a  d  aa  g  e  ...     0     0     1     0   \n",
       "4207  8415   87.48  al  r   e  f  d  aa  l  u  ...     0     0     0     0   \n",
       "4208  8417  110.85   z  r  ae  c  d  aa  g  w  ...     1     0     0     0   \n",
       "\n",
       "      X379  X380  X382  X383  X384  X385  \n",
       "0        0     0     0     0     0     0  \n",
       "1        0     0     0     0     0     0  \n",
       "2        0     0     1     0     0     0  \n",
       "3        0     0     0     0     0     0  \n",
       "4        0     0     0     0     0     0  \n",
       "...    ...   ...   ...   ...   ...   ...  \n",
       "4204     0     0     0     0     0     0  \n",
       "4205     0     0     0     0     0     0  \n",
       "4206     0     0     0     0     0     0  \n",
       "4207     0     0     0     0     0     0  \n",
       "4208     0     0     0     0     0     0  \n",
       "\n",
       "[4209 rows x 378 columns]"
      ]
     },
     "execution_count": 216,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 217,
   "id": "da94dbba",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 4209 entries, 0 to 4208\n",
      "Columns: 378 entries, ID to X385\n",
      "dtypes: float64(1), int64(369), object(8)\n",
      "memory usage: 12.1+ MB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 218,
   "id": "6b7ce0b1",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>y</th>\n",
       "      <th>X10</th>\n",
       "      <th>X11</th>\n",
       "      <th>X12</th>\n",
       "      <th>X13</th>\n",
       "      <th>X14</th>\n",
       "      <th>X15</th>\n",
       "      <th>X16</th>\n",
       "      <th>X17</th>\n",
       "      <th>...</th>\n",
       "      <th>X375</th>\n",
       "      <th>X376</th>\n",
       "      <th>X377</th>\n",
       "      <th>X378</th>\n",
       "      <th>X379</th>\n",
       "      <th>X380</th>\n",
       "      <th>X382</th>\n",
       "      <th>X383</th>\n",
       "      <th>X384</th>\n",
       "      <th>X385</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.0</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "      <td>4209.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>4205.960798</td>\n",
       "      <td>100.669318</td>\n",
       "      <td>0.013305</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.075077</td>\n",
       "      <td>0.057971</td>\n",
       "      <td>0.428130</td>\n",
       "      <td>0.000475</td>\n",
       "      <td>0.002613</td>\n",
       "      <td>0.007603</td>\n",
       "      <td>...</td>\n",
       "      <td>0.318841</td>\n",
       "      <td>0.057258</td>\n",
       "      <td>0.314802</td>\n",
       "      <td>0.020670</td>\n",
       "      <td>0.009503</td>\n",
       "      <td>0.008078</td>\n",
       "      <td>0.007603</td>\n",
       "      <td>0.001663</td>\n",
       "      <td>0.000475</td>\n",
       "      <td>0.001426</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>2437.608688</td>\n",
       "      <td>12.679381</td>\n",
       "      <td>0.114590</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.263547</td>\n",
       "      <td>0.233716</td>\n",
       "      <td>0.494867</td>\n",
       "      <td>0.021796</td>\n",
       "      <td>0.051061</td>\n",
       "      <td>0.086872</td>\n",
       "      <td>...</td>\n",
       "      <td>0.466082</td>\n",
       "      <td>0.232363</td>\n",
       "      <td>0.464492</td>\n",
       "      <td>0.142294</td>\n",
       "      <td>0.097033</td>\n",
       "      <td>0.089524</td>\n",
       "      <td>0.086872</td>\n",
       "      <td>0.040752</td>\n",
       "      <td>0.021796</td>\n",
       "      <td>0.037734</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>72.110000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>2095.000000</td>\n",
       "      <td>90.820000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>4220.000000</td>\n",
       "      <td>99.150000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>6314.000000</td>\n",
       "      <td>109.010000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>8417.000000</td>\n",
       "      <td>265.320000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>8 rows × 370 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                ID            y          X10     X11          X12  \\\n",
       "count  4209.000000  4209.000000  4209.000000  4209.0  4209.000000   \n",
       "mean   4205.960798   100.669318     0.013305     0.0     0.075077   \n",
       "std    2437.608688    12.679381     0.114590     0.0     0.263547   \n",
       "min       0.000000    72.110000     0.000000     0.0     0.000000   \n",
       "25%    2095.000000    90.820000     0.000000     0.0     0.000000   \n",
       "50%    4220.000000    99.150000     0.000000     0.0     0.000000   \n",
       "75%    6314.000000   109.010000     0.000000     0.0     0.000000   \n",
       "max    8417.000000   265.320000     1.000000     0.0     1.000000   \n",
       "\n",
       "               X13          X14          X15          X16          X17  ...  \\\n",
       "count  4209.000000  4209.000000  4209.000000  4209.000000  4209.000000  ...   \n",
       "mean      0.057971     0.428130     0.000475     0.002613     0.007603  ...   \n",
       "std       0.233716     0.494867     0.021796     0.051061     0.086872  ...   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000  ...   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000  ...   \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000  ...   \n",
       "75%       0.000000     1.000000     0.000000     0.000000     0.000000  ...   \n",
       "max       1.000000     1.000000     1.000000     1.000000     1.000000  ...   \n",
       "\n",
       "              X375         X376         X377         X378         X379  \\\n",
       "count  4209.000000  4209.000000  4209.000000  4209.000000  4209.000000   \n",
       "mean      0.318841     0.057258     0.314802     0.020670     0.009503   \n",
       "std       0.466082     0.232363     0.464492     0.142294     0.097033   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "75%       1.000000     0.000000     1.000000     0.000000     0.000000   \n",
       "max       1.000000     1.000000     1.000000     1.000000     1.000000   \n",
       "\n",
       "              X380         X382         X383         X384         X385  \n",
       "count  4209.000000  4209.000000  4209.000000  4209.000000  4209.000000  \n",
       "mean      0.008078     0.007603     0.001663     0.000475     0.001426  \n",
       "std       0.089524     0.086872     0.040752     0.021796     0.037734  \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000  \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000  \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000  \n",
       "75%       0.000000     0.000000     0.000000     0.000000     0.000000  \n",
       "max       1.000000     1.000000     1.000000     1.000000     1.000000  \n",
       "\n",
       "[8 rows x 370 columns]"
      ]
     },
     "execution_count": 218,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7448a399",
   "metadata": {},
   "source": [
    "### Reading CSV\n",
    "\n",
    "CSV文件，全称为逗号分隔值（Comma-Separated Values）文件，是一种简单、实用的文件格式，用于存储和表示包括文本、数值等各种类型的数据。这种文件格式的一个显著特点是：文件内的数据以逗号 ,分隔，呈现一个表格形式。CSV文件只是一个存储数据的文本文件，它不包含任何格式、公式、宏等。因此，它是一个平面文件，可以用记事本打开。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 225,
   "id": "86257527",
   "metadata": {},
   "outputs": [],
   "source": [
    "from io import StringIO,BytesIO"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 231,
   "id": "55c15a13",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'col1,col2,col3\\nx,y,1\\na,b,2\\nc,d,3'"
      ]
     },
     "execution_count": 231,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(\n",
    "    'col1,col2,col3\\nx,y,1\\na,b,2\\nc,d,3'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 245,
   "id": "5aef7073",
   "metadata": {},
   "outputs": [],
   "source": [
    "data=(\n",
    "    'col1,col2,col3\\n'\n",
    "    'x,y,1\\n'\n",
    "    'a,b,2\\n'\n",
    "    'c,d,3'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 246,
   "id": "34196676",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "str"
      ]
     },
     "execution_count": 246,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 247,
   "id": "16e38972",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>x</td>\n",
       "      <td>y</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>b</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2  col3\n",
       "0    x    y     1\n",
       "1    a    b     2\n",
       "2    c    d     3"
      ]
     },
     "execution_count": 247,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#str以csv文件形式读取\n",
    "df = pd.read_csv(StringIO(data))\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 243,
   "id": "16d20d29",
   "metadata": {},
   "outputs": [],
   "source": [
    "#str以csv文件形式保存\n",
    "#df.to_csv('test.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 248,
   "id": "d5928266",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype \n",
      "---  ------  --------------  ----- \n",
      " 0   col1    3 non-null      object\n",
      " 1   col2    3 non-null      object\n",
      " 2   col3    3 non-null      int64 \n",
      "dtypes: int64(1), object(2)\n",
      "memory usage: 200.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 249,
   "id": "017356be",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype \n",
      "---  ------  --------------  ----- \n",
      " 0   col1    3 non-null      object\n",
      " 1   col2    3 non-null      object\n",
      " 2   col3    3 non-null      object\n",
      "dtypes: object(3)\n",
      "memory usage: 200.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "#设置表格索引列数据的数据类型（将原本的str转换成所需要的类型）\n",
    "df=pd.read_csv(StringIO(data),dtype=object)\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 252,
   "id": "fe3bcaef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c\n",
       "0  1  2  3\n",
       "1  4  5  6\n",
       "2  7  8  9"
      ]
     },
     "execution_count": 252,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data=(\n",
    "    'a,b,c\\n'\n",
    "    '1,2,3\\n'\n",
    "    '4,5,6\\n'\n",
    "    '7,8,9'\n",
    ")\n",
    "df = pd.read_csv(StringIO(data))\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 253,
   "id": "3a0cbba2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype\n",
      "---  ------  --------------  -----\n",
      " 0   a       3 non-null      int64\n",
      " 1   b       3 non-null      int64\n",
      " 2   c       3 non-null      int64\n",
      "dtypes: int64(3)\n",
      "memory usage: 200.0 bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 254,
   "id": "b45fb748",
   "metadata": {},
   "outputs": [],
   "source": [
    "#指定列为特定数据类型\n",
    "df=pd.read_csv(StringIO(data),dtype={'b':int,'c':float,'a':'Int64'})#a,b,c是索引列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 255,
   "id": "59cdacc7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b    c\n",
       "0  1  2  3.0\n",
       "1  4  5  6.0\n",
       "2  7  8  9.0"
      ]
     },
     "execution_count": 255,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 256,
   "id": "05d2104e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype  \n",
      "---  ------  --------------  -----  \n",
      " 0   a       3 non-null      Int64  \n",
      " 1   b       3 non-null      int32  \n",
      " 2   c       3 non-null      float64\n",
      "dtypes: Int64(1), float64(1), int32(1)\n",
      "memory usage: 191.0 bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 258,
   "id": "e206a64e",
   "metadata": {},
   "outputs": [],
   "source": [
    "#设置索引列\n",
    "#index_col参数——False默认生成索引列。True(默认值)每行首元素为索引列。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 259,
   "id": "137190b4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c\n",
       "0  1  2  3\n",
       "1  4  5  6\n",
       "2  7  8  9"
      ]
     },
     "execution_count": 259,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data=(#标准形式，每行的尾部没有逗号\n",
    "    'a,b,c\\n'\n",
    "    '1,2,3\\n'\n",
    "    '4,5,6\\n'\n",
    "    '7,8,9'\n",
    ")\n",
    "df = pd.read_csv(StringIO(data))\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 263,
   "id": "44f921cc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b   c\n",
       "1  2  3 NaN\n",
       "4  5  6 NaN\n",
       "7  8  9 NaN"
      ]
     },
     "execution_count": 263,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data=(#非标准形式，每行的尾部有逗号\n",
    "    'a,b,c\\n'\n",
    "    '1,2,3,\\n'\n",
    "    '4,5,6,\\n'\n",
    "    '7,8,9,'\n",
    ")\n",
    "df = pd.read_csv(StringIO(data))\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 265,
   "id": "a6f6406f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c\n",
       "0  1  2  3\n",
       "1  4  5  6\n",
       "2  7  8  9"
      ]
     },
     "execution_count": 265,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#非标准字符转换时要写参数\n",
    "data=(#非标准形式，每行的尾部有逗号\n",
    "    'a,b,c\\n'\n",
    "    '1,2,3,\\n'\n",
    "    '4,5,6,\\n'\n",
    "    '7,8,9,'\n",
    ")\n",
    "df = pd.read_csv(StringIO(data),index_col=False)#添加默认索引列\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 268,
   "id": "50e833d4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   b  c\n",
       "0  2  3\n",
       "1  5  6\n",
       "2  8  9"
      ]
     },
     "execution_count": 268,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#usecols参数——指定显示某列\n",
    "\n",
    "pd.read_csv(StringIO(data),usecols=['b','c'],index_col=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 271,
   "id": "88398d6d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a,b\n",
      "\"hello,\\\"Bob\\\",nice to see you\",5\n"
     ]
    }
   ],
   "source": [
    "#引用和转义字符（在 NLP 中非常有用）\n",
    "\n",
    "#escapechar参数-避免某些字符\n",
    "\n",
    "data='a,b\\n\"hello,\\\\\"Bob\\\\\",nice to see you\",5'\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 274,
   "id": "75cd29d7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>hello,\"Bob\",nice to see you</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             a  b\n",
       "0  hello,\"Bob\",nice to see you  5"
      ]
     },
     "execution_count": 274,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv(StringIO(data),escapechar='\\\\')#\\\\转义字符"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 275,
   "id": "7dee4654",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ab</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>hello\\Bob\\\"nice to see you\"5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             ab\n",
       "0  hello\\Bob\\\"nice to see you\"5"
      ]
     },
     "execution_count": 275,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv(StringIO(data),escapechar=',')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 306,
   "id": "de5ccae2",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\core\\interactiveshell.py:3444: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version.\n",
      "\n",
      "\n",
      "  exec(code_obj, self.user_global_ns, self.user_ns)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>&lt;!DOCTYPE html&gt;&lt;html lang=\"zh\"&gt;&lt;head&gt;&lt;meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"&gt;&lt;meta name=\"viewport\" content=\"width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0\"&gt;&lt;meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\"&gt;&lt;meta name=\"x5-orientation\" content=\"portrait\"&gt;&lt;meta name=\"format-detection\" content=\"telephone=no\"&gt;&lt;meta http-equiv=\"x-dns-prefetch-control\" content=\"on\"&gt;&lt;title&gt;NBA_百度体育&lt;/title&gt;&lt;link rel=\"dns-prefetch\" href=\"//tiyu.baidu.com\"&gt;&lt;link rel=\"dns-prefetch\" href=\"https://code.bdstatic.com\"&gt;&lt;script src=\"https://code.bdstatic.com/npm/spy-client@2.1.8/dist/spy-client.min.js\" type=\"text/javascript\"&gt;&lt;/script&gt;&lt;script src=\"https://code.bdstatic.com/npm/spy-client@2.1.8/dist/spy-head.min.js\" type=\"text/javascript\"&gt;&lt;/script&gt;&lt;script&gt;const pid = '4_7';</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>const spyHead = __spyHead;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>const spy = new SpyClient({pid, lid: '...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>// 数据类型：异常，触发时间：500MsAfterOnLoad</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>spy &amp;&amp; spy.listenHttpResource(function...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>spy.sendExcept({</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>rtPageAtta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103</th>\n",
       "      <td>viewEl.set...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104</th>\n",
       "      <td>});</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>105</th>\n",
       "      <td>&lt;/script&gt;&lt;/div&gt;&lt;/d...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106</th>\n",
       "      <td>&lt;/body&gt;&lt;/html&gt;</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>107 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    <!DOCTYPE html><html lang=\"zh\"><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><meta name=\"viewport\" content=\"width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0\"><meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\"><meta name=\"x5-orientation\" content=\"portrait\"><meta name=\"format-detection\" content=\"telephone=no\"><meta http-equiv=\"x-dns-prefetch-control\" content=\"on\"><title>NBA_百度体育</title><link rel=\"dns-prefetch\" href=\"//tiyu.baidu.com\"><link rel=\"dns-prefetch\" href=\"https://code.bdstatic.com\"><script src=\"https://code.bdstatic.com/npm/spy-client@2.1.8/dist/spy-client.min.js\" type=\"text/javascript\"></script><script src=\"https://code.bdstatic.com/npm/spy-client@2.1.8/dist/spy-head.min.js\" type=\"text/javascript\"></script><script>const pid = '4_7';\n",
       "0                           const spyHead = __spyHead;                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "1            const spy = new SpyClient({pid, lid: '...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "2                     // 数据类型：异常，触发时间：500MsAfterOnLoad                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "3            spy && spy.listenHttpResource(function...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "4                                     spy.sendExcept({                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "..                                                 ...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "102                                      rtPageAtta...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "103                                      viewEl.set...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "104                                                });                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "105                              </script></div></d...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "106                                     </body></html>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           \n",
       "\n",
       "[107 rows x 1 columns]"
      ]
     },
     "execution_count": 306,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#url to csv\n",
    "#https://data.stats.gov.cn/easyquery.htm?cn=C01\n",
    "#https://tiyu.baidu.com/match/NBA/tab/%E7%90%83%E9%98%9F%E6%A6%9C/current/0\n",
    "\n",
    "url=\"https://tiyu.baidu.com/match/NBA/tab/%E6%8E%92%E5%90%8D\"\n",
    "#\n",
    "df=pd.read_csv(url, encoding=\"utf-8\",delimiter=\"\\t\",error_bad_lines=False)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6948eb9b",
   "metadata": {},
   "source": [
    "### Reading HTML\n",
    "\n",
    "\n",
    "HTML（HyperText Markup Language），中文译为“超文本标记语言”，是构建网页的基础HTML（HyperText Markup Language），中文译为“超文本标记语言”，是构建网页的基础，定义了网页内容的含义和结构。与编程语言不同，HTML是一种标记语言，由一系列的元素组成，这些元素可以用来包围不同部分的内容，使其以某种方式呈现或者工作。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 291,
   "id": "c2947769",
   "metadata": {},
   "outputs": [],
   "source": [
    "url=\"https://www.mydrivers.com/zhuanti/tianti/01/index_gaotong.html#xl700\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 292,
   "id": "bc06d9d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_html(url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 293,
   "id": "2f246196",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[            0                   1\n",
       " 0    查看更多榜单>>            查看更多榜单>>\n",
       " 1  查看手机SoC性能榜          查看手机SoC性能榜\n",
       " 2       更新日期：            2023年11月\n",
       " 3        纠 错：  wenq#mydrivers.com,\n",
       "                         0                       1                       2  \\\n",
       " 0  骁龙800系列 查看价格: 京东商城 爱淘宝  骁龙800系列 查看价格: 京东商城 爱淘宝  骁龙800系列 查看价格: 京东商城 爱淘宝   \n",
       " 1                   处理器型号                    制造工艺                   CPU架构   \n",
       " 2        骁龙800  (MSM8x74)                28nm HPM             四核Krait 400   \n",
       " \n",
       "                         3                       4                       5  \\\n",
       " 0  骁龙800系列 查看价格: 京东商城 爱淘宝  骁龙800系列 查看价格: 京东商城 爱淘宝  骁龙800系列 查看价格: 京东商城 爱淘宝   \n",
       " 1                    核心频率                     GPU                      内存   \n",
       " 2                 2.26GHz      Adreno 330  450MHz          双通道 LPDDR3-800   \n",
       " \n",
       "                         6                       7  \\\n",
       " 0  骁龙800系列 查看价格: 京东商城 爱淘宝  骁龙800系列 查看价格: 京东商城 爱淘宝   \n",
       " 1                      基带                    出货时间   \n",
       " 2          HSPA+、CDMA、LTE                 2013年Q2   \n",
       " \n",
       "                                                    8  \n",
       " 0                             骁龙800系列 查看价格: 京东商城 爱淘宝  \n",
       " 1                                               代表机型  \n",
       " 2  LG G2、三星Galaxy S4 LTE-A、  索尼Xperia Z Ultra、三星G...  ]"
      ]
     },
     "execution_count": 293,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eba2c1d4",
   "metadata": {},
   "source": [
    "### Reading Excel\n",
    "\n",
    "Excel文件是由微软公司开发的电子表格程序Microsoft Excel创建和使用的文件，被誉为行业领先的电子表格Excel文件是由微软公司开发的电子表格程序Microsoft Excel创建和使用的文件，被誉为行业领先的电子表格软件程序，是一款功能强大的数据可视化和分析工具。Excel文件可以保存为多种格式，包括“.xls”、“.xlsx”、“.xlsm”等，其中\".xls\"是二进制格式，是2003版本Office Microsoft Office Excel工作表保存的默认格式；\".xlsx\"的核心结构是XML类型结构，采用了XML的压缩方式，使其占用的空间更小，是Excel2007版本的文件；\".xlsm\"与\".xlsx\"一样是属于07年版本的保存文件，一般情况下Excel不会自动启用“宏”，但是在使用过程中可能会需要用到宏功能，这个时候文件的格式就需要选择\".xlsm\"，这样才能够保存表格中的VBA代码。Excel文件可以被上传到网络上进行编辑和分享，适用于所有平台，包括Windows、Mac、Android和iOS。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 283,
   "id": "3d7579e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_excel = pd.read_excel('Excel_Sample.xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 284,
   "id": "b400b254",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0   a   b   c   d\n",
       "0           0   0   1   2   3\n",
       "1           1   4   5   6   7\n",
       "2           2   8   9  10  11\n",
       "3           3  12  13  14  15"
      ]
     },
     "execution_count": 284,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_excel"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6ef49b9",
   "metadata": {},
   "source": [
    "### Reading Pickling\n",
    "\n",
    "Pickling文件在Python中，主要是通过pickle模块实现的。这个专用的持久化模块可以把包括自定义类在内的各种数据进行持久化，比较适合用于存储Python本身复杂的数据结构。\n",
    "\n",
    "具体来说，pickle模块可以将Python对象直接保存到文件中，而不需要先将它们转化为字符串再保存，也不需要用到底层的文件访问操作，可以直接将它们写入到一个二进制文件中。此外，pickle模块也会创建一个Python语言专用的二进制格式，使用者在这个过程中无需关心任何文件细节，pickle模块会自动完成读写对象操作。\n",
    "\n",
    "然而，值得注意的是，pickle序列化后的数据可读性较差，人一般无法识别。并且，这种持久化方式只能在Python环境中使用，不能用作与其他语言进行数据交换。\n",
    "\n",
    "总的来说，pickle模块提供了一个简单的持久化功能，它可以把对象以文件的形式存放在磁盘上，以便在需要的时候可以再次加载和使用这些数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 285,
   "id": "cdf29070",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_excel.to_pickle(\"df_excel_pickle\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 287,
   "id": "95325895",
   "metadata": {},
   "outputs": [],
   "source": [
    "df=pd.read_pickle(\"df_excel_pickle\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 288,
   "id": "671172ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0   a   b   c   d\n",
       "0           0   0   1   2   3\n",
       "1           1   4   5   6   7\n",
       "2           2   8   9  10  11\n",
       "3           3  12  13  14  15"
      ]
     },
     "execution_count": 288,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46414de3",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
